[UNIX] Sort lines of massive file by number of words on line (ideally in parallel)
Posted by conradlee on Stack Overflow
Published on 2010-03-17T21:50:10Z
I am working on a community detection algorithm for analyzing social network data from Facebook. The first task, detecting all cliques in the graph, can be done efficiently in parallel and leaves me with output like this:
17118 17136 17392
17064 17093 17376
17118 17136 17356 17318 12345
17118 17136 17356 17283
17007 17059 17116
Each of these lines represents a unique clique (a collection of node ids), and I want to sort these lines in descending order by the number of ids per line. In the case of the example above, here's what the output should look like:
17118 17136 17356 17318 12345
17118 17136 17356 17283
17118 17136 17392
17064 17093 17376
17007 17059 17116
(Ties, i.e. lines with the same number of ids, can be sorted arbitrarily.)
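For small inputs, the classic decorate-sort-undecorate pipeline does this. Here's a minimal sketch of the baseline I have in mind, assuming the standard awk, sort, and cut utilities (cliques.txt is a placeholder name for my input file):

awk '{print NF, $0}' cliques.txt | sort -k1,1nr | cut -d' ' -f2-

awk prepends each line's field count (NF), sort orders numerically descending on that first field, and cut strips the count back off.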
What is the most efficient way to sort these lines?
Keep the following points in mind:
- The file I want to sort could be larger than the physical memory of the machine
- Most of the machines that I'm running this on have several processors, so a parallel solution would be ideal
- An ideal solution would be just a shell script (probably using sort), but I'm open to simple solutions in Python or Perl (or any language, as long as it makes the task simple); a sketch of the direction I'm leaning follows this list
- In some sense this task is very easy; I'm not looking for just any solution, but for one that is simple and, above all, efficient
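For what it's worth, the direction I'm leaning is the same pipeline with GNU sort doing the heavy lifting: GNU sort already performs an external merge sort (spilling to disk under the -T temporary directory when the data exceeds the -S buffer size), and coreutils 8.6 and later add a --parallel flag. The flag values below are placeholders, not tuned numbers:

awk '{print NF, $0}' cliques.txt |
  sort -k1,1nr --parallel=8 -S 4G -T /tmp |
  cut -d' ' -f2-

Is something like this close to optimal, or can it be done better?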